Introduction

Welcome to my Computational Musicology portfolio for 2025! This storyboard presents my perspective on the examples from each week, exploring AI- and human-generated tracks through visualizations built with R and flexdashboard.

My submitted tracks were entirely AI-generated. Together with the class corpus, I used them to look for trends and patterns that distinguish AI- and human-generated music. The dataset includes Essentia track-level features such as arousal, danceability, instrumentalness, tempo, and valence. Further insights are drawn from the tracks’ spectral, temporal, and harmonic characteristics through visualizations such as chromagrams, cepstrograms, self-similarity charts, keygrams, chordograms, tempograms, histograms, and energy novelty charts.

Personal Tracks

I decided to explore several genres that I like by asking different models for songs in each of them. I have always been a fan of post-punk and alternative rock from the late 90s to early 2000s, by bands such as Interpol, The Strokes, Bloc Party, Arctic Monkeys, and Fontaines D.C. With the help of AI, I came up with this description to use as a prompt for generative AI music models: “a high-energy alt rock / post-punk song with a melodic bassline, intricate drumming, and sharp and rhythmic guitar work, reminiscent of the bands Interpol and Bloc Party. dynamic, with tension-building verses leading into an explosive, anthemic chorus. create a sense of depth and intensity.” For models with character limits, I used this shortened version: “Post-Punk, Driving Melodic Bassline, Angular Reverb-Drenched Guitars, Punchy Dynamic Drumming, Moody Detached Vocals, Urgent & Anthemic, Dark Yet Energetic, Tension-Building Composition, 140 BPM”.

I have also recently been enjoying deep house music, so I chose this genre for one of my songs. I used the following prompts: “deep house song that has a hypnotic beat, gradually layering warm synths, deep basslines, and subtle percussive beats, with a steady, entrancing rhythm. slow, cinematic build-ups that evoke nostalgia and euphoria. Incorporate atmospheric pads and a shimmering, time-dissolving feel of the track, with immersive, and emotionally uplifting verses and bridges, suitable for a sunset in the mountains” and “Deep House, Hypnotic Synth Pads, Pulsing Bassline, Rolling Four-on-the-Floor Groove, Atmospheric Textures, Slow-Building Progression, Dreamy Vocal Samples, Cinematic, Nostalgic, Expansive, 120 BPM”.

I also wanted to try a third genre that I enjoy, something along the lines of Lana Del Rey’s style, so I used the following prompts: “A cinematic baroque / dream pop composition that blends dreamy electronic synths with classical orchestration. Feature violins, melancholic clarinets, and rich trumpet swells, weaving through ethereal synth pads and delicate, reverberated piano. The rhythm should be slow and hypnotic, with a hazy, dreamlike quality. The vocals should be intimate yet grand, drenched in vintage-style reverb, with poetic, melancholic lyrics evoking themes of romance, nostalgia, and faded Hollywood glamour. Think of Lana Del Rey’s storytelling style, but with a modern dream pop twist, layered harmonies, sweeping crescendos, and an air of cinematic longing.” and “Baroque Pop / Dream Pop, Ethereal Synth Pads, Sweeping Violin & Clarinet Arrangements, Melancholic Trumpet Swells, Reverb-Drenched Intimate Vocals, Vintage Aesthetic, Poetic & Nostalgic, Cinematic & Grand, 80 BPM”.

I explored the outputs of these prompts from various models, including Suno, Stable Audio, Beatoven.ai, Soundverse.ai, Udio, and Mubert. Although I was hesitant to try Suno and Udio given their use of artists’ music without compensating them, I wanted to see whether there would be any differences in quality, production, relevance to the prompt, and similarity to my expectations and to existing songs. I found the vocal and lyrical quality of most models to be somewhat lower than I was expecting, with many songs sounding unnatural or AI-generated (understandably).

My First Track
Description:

I chose the Stable Audio deep house track as my first track because it best matched what I was hoping for: emotive and intense, yet calming and not too elaborate.

My Second Track
Description:

For my second track, I chose another deep house song, this one generated with Suno.

Visualising the AI Song Contest


This is the bad visualisation of the AI Song Contest we used in our first lab session, this time in a dashboard.

Visualising the AI Song Contest 2


To build on the first visualization, I tried to tell a clearer story by improving its look and making it more readable. I did this by:

  1. Removing elements like geom_rug() that did not add much value
  2. Adjusting labels, font sizes, and layout to improve clarity
  3. Adding trend lines or facets for better pattern recognition, enhancing comparisons
  4. Using the size and color variables in a meaningful way

This updated version:

✅ Clearly shows tempo vs. arousal trends
✅ Uses color and size effectively
✅ Highlights overall trends with a dashed trend line
✅ Has a clean and readable layout

compmus2025 corpus

Chromagrams for my 2 songs


I modified the template code by:

  1. Changing the norm parameter, which affects how the chroma features are normalized: I chose the Manhattan norm to retain musical structure
  2. Changing the theme and color scales: For personalization and to vary the visual clarity, I used the rocket color scale and the light theme option
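To make the effect of the norm parameter concrete, here is a minimal sketch in plain Python (not the R code behind this dashboard). The function name `normalise` and the example frame are hypothetical; the sketch only assumes that normalization means dividing each chroma frame by its Manhattan or Euclidean norm.

```python
def normalise(vector, norm="manhattan"):
    """Scale a chroma vector so that its chosen norm equals 1."""
    if norm == "manhattan":
        size = sum(abs(v) for v in vector)
    elif norm == "euclidean":
        size = sum(v * v for v in vector) ** 0.5
    else:
        raise ValueError(f"unknown norm: {norm}")
    # Leave silent (all-zero) frames untouched to avoid dividing by zero.
    if size == 0:
        return list(vector)
    return [v / size for v in vector]


# Hypothetical energy per pitch class (C, C#, ..., B) for a single frame.
frame = [0.0, 1.0, 0.0, 0.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]
manhattan_frame = normalise(frame, "manhattan")  # entries sum to 1
euclidean_frame = normalise(frame, "euclidean")  # squared entries sum to 1
```

With the Manhattan norm each frame reads as a distribution over the 12 pitch classes, which makes frames comparable regardless of how loud they are.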

A side-by-side analysis of the chromagrams of my two tracks reveals how the 12 musical pitch classes are distributed in each song. A chromagram represents pitch class content regardless of octave, making it useful for identifying harmonic structure and key. Track 1 shows relatively strong F#, F, and E pitch classes and a weaker C pitch class. Track 2 shows frequent occurrence of the A#, F#, D#, and C# pitch classes. The first track is more consistent in its chroma distribution across time, as it is a deep house song with a steady build-up, while track 2 fluctuates more irregularly, possibly indicating more dynamic harmonic shifts or different instrumentation, as the track is more upbeat, joyful, and playful.

Cepstrogram


This cepstrogram of track 1 shows how the cepstrum of the signal changes over time, which is useful for timbre analysis. It is computed by taking the magnitude spectrum of each frame (the amplitude of a Fourier transform), taking the logarithm of that spectrum, and then applying a second, inverse Fourier transform. The result of that last step is the cepstrum.
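The steps described above can be sketched for a single frame as follows. This is an illustration in plain Python under simplifying assumptions, not the code behind the plot: it uses a naive Fourier transform and the textbook real cepstrum, whereas practical timbre features usually add a mel filter bank and a cosine transform (MFCCs). The names `dft` and `real_cepstrum` and the example frame are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    spectrum = dft(frame)
    # The floor guards against log(0) for empty frequency bins.
    log_magnitude = [math.log(max(abs(s), floor)) for s in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Hypothetical frame: a decaying impulse train, not real audio.
frame = [0.9 ** t if t % 8 == 0 else 0.0 for t in range(64)]
cepstrum = real_cepstrum(frame)
```

Stacking such cepstra for consecutive frames, one column per frame, gives a cepstrogram.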

I modified the template code by:

  1. Changing the norm parameter, which affects how the cepstral features are normalized: I chose the Euclidean norm, which is suited to high-dimensional data and emphasizes clarity.
  2. Changing the theme and color scales: For personalization and variation on the visual clarity, I used the rocket color scale and the dark theme option

Chroma-based self-similarity

Timbre-based self-similarity


Chroma- vs Timbre- based self-similarity for track 1:

Timbre features, often represented by MFCCs (Mel-Frequency Cepstral Coefficients), capture the spectral characteristics of the sound. The timbre-based self-similarity chart therefore highlights instrumental changes and shifts in overall sound quality or production. How effective chroma- or timbre-based self-similarity is for structural analysis depends on the specific characteristics of the track. Chroma-based self-similarity captures harmonic progressions, tonal structure, key changes, and chord progressions, and it gives a clearer structural picture for tracks with rich harmonic content, such as pop, jazz, and classical music. Timbre-based self-similarity captures instrumental texture and sound quality, outlining changes in orchestration, dynamics, and articulation. Because my chosen track is an electronic / EDM song, its timbre features are at the forefront, which makes the timbre-based chart the more insightful of the two. It shows the track’s repeated instrumental sections as prominent diagonal lines, with sudden shifts indicating timbral changes (such as the introduction of new instruments).
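As a rough illustration of how a self-similarity chart comes about, the sketch below compares every frame of a feature sequence with every other frame. It is written in plain Python with made-up three-dimensional frames; the choice of cosine distance and the names `cosine_distance` and `self_similarity` are assumptions for this example, and the same loop would apply to chroma or timbre vectors alike.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat silent frames as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def self_similarity(frames):
    """Pairwise distance matrix: entry [i][j] compares frame i with frame j."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]


# Hypothetical feature frames following a section pattern A, B, A.
frames = [
    [1.0, 0.0, 0.0], [1.0, 0.1, 0.0],  # section A
    [0.0, 1.0, 1.0], [0.0, 1.0, 0.9],  # section B
    [1.0, 0.0, 0.0], [1.0, 0.1, 0.0],  # section A again
]
matrix = self_similarity(frames)
```

In this toy matrix, the entries [0][4] and [1][5] are zero because section A returns unchanged. When plotted, such repeated material shows up as a line parallel to the main diagonal, while the A to B boundary appears as a sharp block edge.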

Keygram


Using the 1–0 coding for the chord templates, I generated a keygram for my first track. This keygram makes use of the new helper function compmus_match_pitch_templates, which compares the averaged chroma vectors against templates. Generally, a keygram shows which key is most likely at each point in time by matching chroma features (pitch class profiles) to predefined templates. For instance, for the chosen track, dark colors, such as at the start of the track and around the 70-80 second range, represent short distances between the observed chroma and the template, that is, a close match. Keygrams help identify harmonic progressions, modulations, and changes in harmony over time.
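The matching idea can be sketched in a few lines of plain Python. This is an illustration only and does not reproduce compmus_match_pitch_templates: the binary triad templates, the cosine distance, the helper names, and the example chroma frame are all assumptions chosen for the example.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Binary (1-0) templates written for root C: root, third, and fifth are 1.
MAJOR_TRIAD = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
MINOR_TRIAD = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]


def rotate(template, steps):
    """Shift a template written for root C up by `steps` semitones."""
    return [template[(i - steps) % 12] for i in range(12)]


def build_templates():
    """One major and one minor template for each of the 12 roots."""
    templates = {}
    for root, name in enumerate(PITCH_CLASSES):
        templates[f"{name} major"] = rotate(MAJOR_TRIAD, root)
        templates[f"{name} minor"] = rotate(MINOR_TRIAD, root)
    return templates


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical averaged chroma frame with most energy on D, G, and B.
chroma = [0.05, 0.0, 0.8, 0.0, 0.05, 0.0, 0.0, 1.0, 0.0, 0.1, 0.0, 0.7]
templates = build_templates()
distances = {name: cosine_distance(chroma, t) for name, t in templates.items()}
best = min(distances, key=distances.get)  # template with the smallest distance
```

Repeating this for every time window and plotting the distances as colors, one row per template, gives the kind of chart shown here; the smallest distance in a column corresponds to the darkest cell.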

Keygram using Temperley’s proposed improvements


This is the same keygram generated using Temperley’s proposed improvements. It reveals broadly similar insights, but in general Temperley’s improvements yield clearer or more stable key regions, assign higher weights to stable scale degrees (tonic, dominant), and reflect more natural tonal hierarchies.

Chordogram


A chordogram is a visual representation of the harmonic structure of a song over time. It maps how different chords are present throughout the track. The x-axis represents time (in seconds), while the y-axis lists the different chord templates. The color intensity indicates how strongly a given chord is detected at a particular time, with darker shades showing stronger matches. The darker bands suggest that chords such as :maj, G major, F major, and D minor occur frequently. The song features dynamic shifts, with many chord changes audible in the more rhythmic sections, such as at 47 seconds, where visible variations in chord intensity appear.

Tempograms for track 1


A tempogram is a time-tempo representation that encodes the local tempo of a music signal over time, showing how the tempo of a piece changes throughout its duration. This makes tempograms useful for analyzing rhythmic structure and tempo variation. The regular tempogram shows several tempo harmonics (visible as lines at roughly 150, 220, 380, 500, and 600 BPM). This suggests that the music contains strong rhythmic subdivisions, reinforcing multiple layers of the base tempo. The strong presence of harmonics points to a steady and rhythmic beat structure, likely because it is an electronic / techno track. The lack of tempo variation over time indicates that the tempo remains stable throughout the track. The brighter regions (e.g., near 100 seconds) show moments where certain tempo components are dominant, reflecting rhythmic changes or intensifications, which I confirmed by listening to the track again.

The cyclic tempogram, on the other hand, is limited to a cyclic range (80–160 BPM), which focuses on musically relevant tempos and avoids higher harmonics. It reveals a dominant tempo of around 120 BPM, with weaker subharmonics at around 90 and 150 BPM.
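The cyclic view rests on treating tempi that differ by a factor of two as equivalent. A minimal sketch of that folding step, in plain Python with a hypothetical function name and made-up tempo candidates, could look like this:

```python
def fold_tempo(bpm, low=80.0):
    """Map a tempo into the octave [low, 2 * low) by doubling or halving."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    while bpm < low:
        bpm *= 2
    while bpm >= 2 * low:
        bpm /= 2
    return bpm


# Hypothetical tempo candidates: a 120 BPM pulse with its half and double times.
candidates = [60, 120, 240, 480]
folded = [fold_tempo(c) for c in candidates]  # every candidate lands on 120
```

Because all octave-related candidates collapse onto one value inside the 80 to 160 BPM range, the main tempo stands out more clearly than in the regular tempogram.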

The regular tempogram captures broader harmonic structures, while the cyclic tempogram focuses on musically significant tempo ranges, making it easier to identify the main tempo.

Class Corpus Tempi Histogram


I decided to create a histogram of the class corpus to observe the most common tempi. There is a clear peak around 90-100 BPM, suggesting that this is the most frequent tempo range in the class corpus. It is also interesting that there are multiple peaks, suggesting that there is a diverse set of tempos rather than a single dominant one. There are also some tracks in the lower tempo range (below 50 BPM), possibly due to half-time interpretations.
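For readers who want to see the binning behind such a histogram, here is a small sketch in plain Python. The tempo values are hypothetical and are not taken from the class corpus; the 10 BPM bin width and the function name are likewise assumptions.

```python
from collections import Counter


def tempo_histogram(tempi, bin_width=10):
    """Count tracks per tempo bin; each key is the lower edge of a bin."""
    counts = Counter(int(t // bin_width) * bin_width for t in tempi)
    return dict(sorted(counts.items()))


# Hypothetical tempi in BPM, for illustration only.
tempi = [48, 92, 95, 97, 99, 104, 118, 121, 128, 140]
histogram = tempo_histogram(tempi)
peak_bin = max(histogram, key=histogram.get)  # lower edge of the fullest bin
```

A value such as 48 BPM in this toy list is the kind of entry that may be a half-time reading of a track closer to 96 BPM.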

Energy & Spectral Novelty Track 1


The compmus_energy_novelty() function estimates novelty based on sudden changes in loudness over time. It detects significant shifts in energy levels, which are useful for identifying musical onsets and transitions. For this particular track, there is an evident onset at approximately 10 seconds, with smaller onsets at 115 and 125 seconds.
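One common way to turn loudness changes into a novelty curve is to keep only the increases in log energy from one frame to the next. The sketch below shows that idea in plain Python; it is an assumption about the general technique and does not claim to match the internals of compmus_energy_novelty(). The function names and the toy signal are hypothetical.

```python
import math


def frame_energy(samples, frame_size):
    """Root-mean-square energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies


def energy_novelty(energies, floor=1e-6):
    """Positive frame-to-frame increases in log energy; decreases become 0."""
    log_energy = [math.log(max(e, floor)) for e in energies]
    return [max(b - a, 0.0) for a, b in zip(log_energy, log_energy[1:])]


# Hypothetical signal: quiet, then a loud entry, then quiet again.
samples = [0.01] * 40 + [0.5] * 40 + [0.01] * 40
energies = frame_energy(samples, frame_size=20)
novelty = energy_novelty(energies)
# novelty[i] compares frame i with frame i + 1, hence the offset of 1.
onset_frame = novelty.index(max(novelty)) + 1
```

The loud entry produces a single peak in the novelty curve, while the drop back to quiet is ignored, which is why peaks in such a curve line up with onsets rather than with endings.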

The compmus_spectral_novelty() function approximates spectral novelty by analyzing cepstrograms, which represent changes in the frequency content of a signal, and it detects harmonic or timbral shifts better than energy-based novelty does. Because this curve is more even than the energy novelty curve, the track appears to vary less in spectral content than in loudness.

Clustering

Heatmaps

Revised Random Forest Classification


Revised Elements:

Instead of using valence and tempo, the plot now displays instrumentalness (x-axis) and arousal (y-axis), which were identified as more important features by the random forest model. Danceability is used for point size, suggesting it plays a significant role in distinguishing AI vs. Non-AI music.

Analysis:

There is a negative correlation between instrumentalness (x-axis) and arousal (y-axis). As instrumentalness increases, arousal tends to decrease. This suggests that tracks with more instrumental content tend to be less energetic.
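The strength of such a relationship can be checked with a Pearson correlation coefficient. The sketch below is plain Python with hypothetical feature values, not figures from the corpus; it only illustrates how the sign of the coefficient expresses the trend described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature values for five tracks, for illustration only.
instrumentalness = [0.1, 0.3, 0.5, 0.7, 0.9]
arousal = [6.0, 5.5, 5.1, 4.2, 3.9]
r = pearson(instrumentalness, arousal)  # negative when arousal falls as x rises
```

A coefficient near -1 would indicate a strong negative trend, and a value near 0 would mean the scatter shows no linear relationship.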

AI-generated tracks appear more concentrated at higher instrumentalness values (0.7-0.9) and lower arousal values. Non-AI tracks are more spread out, with some occurring at lower instrumentalness values and higher arousal values. This suggests that AI music may favor more instrumental, low-energy compositions.

Danceability is mapped to point size, showing variation across AI and Non-AI tracks. Larger points (high danceability) are spread throughout, meaning danceability does not show a strong trend with instrumentalness or arousal.

It can be interpreted that AI-generated music might be more predictable and structured, focusing on instrumental, low-energy tracks, while Non-AI music covers a broader range, possibly due to human creativity and varied emotional expression.

Confusion Matrix


The confusion chart on the left represents the performance of a classifier attempting to distinguish between AI-generated and non-AI-generated music. The classifier is a nearest-neighbor model, and the most important features selected for classification were instrumentalness, danceability, arousal, valence, and tempo.
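To show how a confusion matrix is read, here is a short sketch in plain Python. The labels and predictions are invented for the example and do not reproduce the results in the chart; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes and columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def precision_recall(matrix, index):
    """Precision and recall for the class at position `index`."""
    true_positive = matrix[index][index]
    predicted_total = sum(row[index] for row in matrix)
    actual_total = sum(matrix[index])
    precision = true_positive / predicted_total if predicted_total else 0.0
    recall = true_positive / actual_total if actual_total else 0.0
    return precision, recall


labels = ["AI", "Non-AI"]
# Hypothetical ground truth and classifier output for eight tracks.
actual = ["AI", "AI", "AI", "AI", "Non-AI", "Non-AI", "Non-AI", "Non-AI"]
predicted = ["AI", "AI", "AI", "Non-AI", "AI", "Non-AI", "Non-AI", "Non-AI"]
matrix = confusion_matrix(actual, predicted, labels)
precision, recall = precision_recall(matrix, labels.index("AI"))
```

The diagonal of the matrix counts correct classifications, and the off-diagonal cells show which class is mistaken for which. Precision and recall summarize those cells for one class at a time.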

Conclusion / Discussion

I found making this dashboard extremely informative and fascinating. Through the development of various visualizations concerning the 4 moments of musical sound (pitch, volume, timbre, and duration), I gained a lot of insight into different characteristics of music and learned how sound can be visualized and expressed in an explorable manner. I also found it very insightful to receive weekly feedback on my portfolio. Being able to see fellow students’ perspectives in a course with such a diverse range of academic backgrounds was really interesting. I incorporated several of the comments and suggestions that I received and was able to learn a lot about the computational side of music analysis.

I think that working on this portfolio has deepened my appreciation for the complexity of music and the depth of computational analysis. By using chromagrams, cepstrograms, self-similarity charts, keygrams, chordograms, tempograms, histograms, and energy novelty charts, I was able to break down AI- and human-generated tracks into quantifiable elements.

One of the most interesting takeaways from this portfolio was seeing how the tracks in the class corpus compare, including the similarities and differences between students’ songs in track-level and corpus-level features like valence, tempo, pitch distributions, loudness dynamics, timbre characteristics, and rhythmic structures. I also found that the classification analysis revealed fascinating insights into how different musical features contribute to genre perception and to categorization as AI- or human-generated. I am generally really intrigued by the global AI shift, and I am glad I was able to explore this evolution from the musical side.